This notebook demonstrates how to use ML Workbench to create a regression model that accepts numeric and categorical data. It shows "cloud run" mode, in which each step runs on Google Cloud Platform services. Cloud runs are distributed, so they can handle large datasets without being limited by the memory, compute, or disk of a single machine. This notebook is similar to the previous one (Taxi Fare Model (small data)), but it uses the full dataset (about 77M instances).
There are only a few things that need to change between "local run" and "cloud run": each step takes the --cloud flag (plus a cloud_config where needed, such as region and scale_tier for training), and all input and output data must live in BigQuery or Google Cloud Storage rather than on local disk.
Other than this, nothing else changes from local to cloud!
Note: "Run all cells" does not work for this notebook because the steps are asynchronous. Many steps submit a cloud job, and you should track its status by following the job link.
Execution of this notebook requires Google Datalab (see setup instructions).
We will use the Chicago Taxi Trips data. Using the pickup location, drop-off location, trip start time, and taxi company, the model we build will predict the trip fare.
In [27]:
%%bq query --name taxi_query_eval
SELECT
unique_key,
fare,
CAST(EXTRACT(DAYOFWEEK FROM trip_start_timestamp) AS STRING) as weekday,
CAST(EXTRACT(DAYOFYEAR FROM trip_start_timestamp) AS STRING) as day,
CAST(EXTRACT(HOUR FROM trip_start_timestamp) AS STRING) as hour,
pickup_latitude,
pickup_longitude,
dropoff_latitude,
dropoff_longitude,
company
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE
fare > 2.0 AND fare < 200.0 AND
pickup_latitude IS NOT NULL AND
pickup_longitude IS NOT NULL AND
dropoff_latitude IS NOT NULL AND
dropoff_longitude IS NOT NULL AND
MOD(ABS(FARM_FINGERPRINT(unique_key)), 100) < 5
In [28]:
%%bq query --name taxi_query_train
SELECT
unique_key,
fare,
CAST(EXTRACT(DAYOFWEEK FROM trip_start_timestamp) AS STRING) as weekday,
CAST(EXTRACT(DAYOFYEAR FROM trip_start_timestamp) AS STRING) as day,
CAST(EXTRACT(HOUR FROM trip_start_timestamp) AS STRING) as hour,
pickup_latitude,
pickup_longitude,
dropoff_latitude,
dropoff_longitude,
company
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE
fare > 2.0 AND fare < 200.0 AND
pickup_latitude IS NOT NULL AND
pickup_longitude IS NOT NULL AND
dropoff_latitude IS NOT NULL AND
dropoff_longitude IS NOT NULL AND
MOD(ABS(FARM_FINGERPRINT(unique_key)), 100) >= 5
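The last WHERE condition splits the data deterministically: FARM_FINGERPRINT hashes each unique_key into one of 100 buckets, rows in buckets 0-4 (about 5%) go to the eval query, and the remaining ~95% go to the train query. The cell below is a minimal sketch of the same idea in Python; it uses hashlib's MD5 as a stand-in for FARM_FINGERPRINT (an assumption for illustration only), so the bucket values will not match BigQuery's, but the split is equally stable and disjoint.
In [ ]:
# Illustrative sketch of a deterministic hash-based train/eval split.
# Assumption: MD5 stands in for BigQuery's FARM_FINGERPRINT, so bucket values differ
# from the SQL above, but the idea (stable, disjoint 5%/95% split) is the same.
import hashlib

def split_bucket(unique_key, num_buckets=100):
    """Map a key to a stable bucket in [0, num_buckets)."""
    digest = hashlib.md5(unique_key.encode('utf-8')).hexdigest()
    return int(digest, 16) % num_buckets

def is_eval_row(unique_key, eval_percent=5):
    return split_bucket(unique_key) < eval_percent

print(is_eval_row('some-taxi-trip-key'))  # the same key always lands in the same split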
Create "chicago_taxi.train" and "chicago_taxi.eval" BQ tables to store results.
In [29]:
%%bq datasets create --name chicago_taxi
In [30]:
%%bq execute
query: taxi_query_eval
table: chicago_taxi.eval
mode: overwrite
Out[30]:
In [31]:
%%bq execute
query: taxi_query_train
table: chicago_taxi.train
mode: overwrite
Out[31]:
Sanity check on the data.
In [32]:
%%bq query
SELECT count(*) FROM chicago_taxi.train
Out[32]:
In [10]:
%%bq query
SELECT count(*) FROM chicago_taxi.eval
Out[10]:
The ML Workbench magics are a set of Datalab commands that provide an easy, largely code-free experience for training, deploying, and predicting with ML models. There is one magic command for each step of the ML workflow: analyzing input data to build transforms, transforming data, training a model, evaluating a model, and deploying a model. This notebook takes the data in the BigQuery tables above and builds a regression model.
For details of each command, run with --help. For example, "%%ml train --help".
This notebook runs the analyze, transform, and training steps in the cloud using managed services. Notice that the "--cloud" flag is set for each step.
In [3]:
import google.datalab.contrib.mlworkbench.commands # this loads the %%ml commands
In [35]:
%%ml dataset create
name: taxi_data_full
format: bigquery
train: chicago_taxi.train
eval: chicago_taxi.eval
In [ ]:
!gsutil mb gs://datalab-chicago-taxi-demo # Create a Storage Bucket to store results.
In [ ]:
!gsutil rm -r -f gs://datalab-chicago-taxi-demo/analysis # Remove previous analysis results if any
In [38]:
%%ml analyze --cloud
output: gs://datalab-chicago-taxi-demo/analysis
data: taxi_data_full
features:
unique_key:
transform: key
fare:
transform: target
company:
transform: embedding
embedding_dim: 10
weekday:
transform: one_hot
day:
transform: one_hot
hour:
transform: one_hot
pickup_latitude:
transform: scale
pickup_longitude:
transform: scale
dropoff_latitude:
transform: scale
dropoff_longitude:
transform: scale
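When the analyze job finishes, its output directory contains summary artifacts such as the inferred schema, numeric statistics, and vocabularies for the categorical columns. The cell below is a minimal sketch for listing them from Python; it assumes TensorFlow's tf.gfile (TF 1.x, as bundled with Datalab) for GCS access, and the exact artifact file names are produced by the analyze service rather than spelled out here.
In [ ]:
# A minimal sketch: list the artifacts written by %%ml analyze.
# Assumption: TensorFlow 1.x provides tf.gfile with GCS support in this environment.
import tensorflow as tf

for name in tf.gfile.ListDirectory('gs://datalab-chicago-taxi-demo/analysis'):
    print(name)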
The transform step applies the preprocessing transformations to the input data and saves the results as TFRecord files containing tf.Example protocol buffers, a format TensorFlow can read efficiently. This allows training to start from preprocessed data. Without this step, training would have to re-apply the same preprocessing to every row of data each time it is read, and since TensorFlow reads each row multiple times during training, the same row would be preprocessed multiple times. Writing the preprocessed data to disk therefore speeds up training.
The transform step is required if your source data is in a BigQuery table.
We run the transform step on both the training and eval data.
In [ ]:
!gsutil -m rm -r -f gs://datalab-chicago-taxi-demo/transform # Remove previous transform results if any.
The transform takes about 6 hours in the cloud. The data is fairly big (about 33 GB), and processing it locally on a single VM would take much longer.
In [40]:
%%ml transform --cloud
output: gs://datalab-chicago-taxi-demo/transform
analysis: gs://datalab-chicago-taxi-demo/analysis
data: taxi_data_full
In [5]:
!gsutil list gs://datalab-chicago-taxi-demo/transform/eval-*
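Once the transform job finishes, the shards listed above can be inspected directly. The cell below is a minimal sketch, assuming TensorFlow 1.x APIs (as bundled with Datalab) and gzip-compressed TFRecord shards; drop the options argument if the files turn out to be uncompressed.
In [ ]:
# A minimal sketch: decode one transformed row from the TFRecord output.
# Assumptions: TensorFlow 1.x (tf.gfile, tf.python_io) and gzip-compressed shards.
import tensorflow as tf

shards = tf.gfile.Glob('gs://datalab-chicago-taxi-demo/transform/eval-*')
options = tf.python_io.TFRecordOptions(tf.python_io.TFRecordCompressionType.GZIP)
for record in tf.python_io.tf_record_iterator(shards[0], options=options):
    print(tf.train.Example.FromString(record))  # one preprocessed row as a tf.Example proto
    break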
In [5]:
%%ml dataset create
name: taxi_data_transformed
format: transformed
train: gs://datalab-chicago-taxi-demo/transform/train-*
eval: gs://datalab-chicago-taxi-demo/transform/eval-*
In [ ]:
!gsutil -m rm -r -f gs://datalab-chicago-taxi-demo/train # Remove previous training results.
Training takes about 30 minutes with the "STANDARD_1" scale_tier. Note that we perform 1M training steps, which would take much longer if run locally on Datalab's single VM. Cloud ML Engine runs training in a distributed way across multiple VMs, so it finishes much faster.
In [6]:
%%ml train --cloud
output: gs://datalab-chicago-taxi-demo/train
analysis: gs://datalab-chicago-taxi-demo/analysis
data: taxi_data_transformed
model_args:
model: dnn_regression
hidden-layer-size1: 400
hidden-layer-size2: 200
train-batch-size: 1000
max-steps: 1000000
cloud_config:
region: us-east1
scale_tier: STANDARD_1
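While the training job runs (and after it completes), you can watch the loss curves from the training output directory. The cell below is a minimal sketch, assuming the google.datalab.ml TensorBoard helper available in this Datalab version.
In [ ]:
# A minimal sketch: point TensorBoard at the training output on GCS.
# Assumption: google.datalab.ml exposes a TensorBoard helper with a start() method.
from google.datalab.ml import TensorBoard

TensorBoard.start('gs://datalab-chicago-taxi-demo/train')  # prints a link to the TensorBoard UI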
In [ ]:
# Delete previous results
!gsutil -m rm -r gs://datalab-chicago-taxi-demo/batch_prediction
Currently, the batch prediction service does not accept BigQuery data as input, so we export the eval data to a CSV file.
In [9]:
%%bq extract
table: chicago_taxi.eval
format: csv
path: gs://datalab-chicago-taxi-demo/eval.csv
Run batch prediction. Note that we use the evaluation_model because it accepts input data that includes the target (truth) column.
In [8]:
%%ml batch_predict --cloud
model: gs://datalab-chicago-taxi-demo/train/evaluation_model
output: gs://datalab-chicago-taxi-demo/batch_prediction
format: csv
data:
csv: gs://datalab-chicago-taxi-demo/eval.csv
cloud_config:
region: us-east1
Once batch prediction is done, check the results files. The batch prediction service writes its output as JSON files.
In [14]:
!gsutil list -l -h gs://datalab-chicago-taxi-demo/batch_prediction
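Each output file is newline-delimited JSON, one prediction per line. The cell below is a minimal sketch for peeking at a few records from Python, assuming TensorFlow's tf.gfile (TF 1.x) for GCS access; the field names shown in the comment follow the schema used in the BigQuery load below.
In [ ]:
# A minimal sketch: print the first few prediction records.
# Assumption: TensorFlow 1.x tf.gfile for reading from GCS; files are newline-delimited JSON.
import json
import tensorflow as tf

results = tf.gfile.Glob('gs://datalab-chicago-taxi-demo/batch_prediction/prediction.results*')
with tf.gfile.GFile(results[0]) as f:
    for i, line in enumerate(f):
        print(json.loads(line))  # e.g. {"unique_key": ..., "predicted": ..., "target": ...}
        if i >= 2:
            break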
We can load the results back to BigQuery.
In [10]:
%%bq load
format: json
mode: overwrite
table: chicago_taxi.eval_results
path: gs://datalab-chicago-taxi-demo/batch_prediction/prediction.results*
schema:
- name: unique_key
type: STRING
- name: predicted
type: FLOAT
- name: target
type: FLOAT
With the data in BigQuery, we can do some analysis with queries, for example computing RMSE.
In [11]:
%%ml evaluate regression
bigquery: chicago_taxi.eval_results
Out[11]:
From the output above, the results are better than the local run on sampled data: RMSE is reduced by about 2.5%, MAE by around 20%, and the average absolute error by around 30%.
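As a cross-check on the %%ml evaluate output, the same RMSE can be computed with a plain query from Python. The cell below is a minimal sketch, assuming the google.datalab.bigquery client API that ships with Datalab.
In [ ]:
# A minimal sketch: compute RMSE directly from the results table.
# Assumption: the google.datalab.bigquery API (Query().execute().result().to_dataframe()).
import google.datalab.bigquery as bq

sql = '''
SELECT SQRT(AVG(POW(predicted - target, 2))) AS rmse
FROM `chicago_taxi.eval_results`
'''
print(bq.Query(sql).execute().result().to_dataframe())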
Select top results sorted by error.
In [12]:
%%bq query
SELECT
predicted,
target,
ABS(predicted-target) as error,
s.*
FROM `chicago_taxi.eval_results` as r
JOIN `chicago_taxi.eval` as s
ON r.unique_key = s.unique_key
ORDER BY error DESC
LIMIT 10
Out[12]:
There is also a feature slice visualization component designed for viewing evaluation results. It shows how prediction performance correlates with feature values.
In [40]:
%%bq query --name error_by_hour
SELECT
COUNT(*) as count,
hour as feature,
AVG(ABS(predicted - target)) as avg_error,
STDDEV(ABS(predicted - target)) as stddev_error
FROM `chicago_taxi.eval_results` as r
JOIN `chicago_taxi.eval` as s
ON r.unique_key = s.unique_key
GROUP BY hour
In [44]:
# Note: the interactive output is replaced with a static image so it displays well in github.
# Please execute this cell to see the interactive component.
from google.datalab.ml import FeatureSliceView
FeatureSliceView().plot(error_by_hour)
Out[44]:
In [42]:
%%bq query --name error_by_weekday
SELECT
COUNT(*) as count,
weekday as feature,
AVG(ABS(predicted - target)) as avg_error,
STDDEV(ABS(predicted - target)) as stddev_error
FROM `chicago_taxi.eval_results` as r
JOIN `chicago_taxi.eval` as s
ON r.unique_key = s.unique_key
GROUP BY weekday
In [45]:
# Note: the interactive output is replaced with a static image so it displays well in github.
# Please execute this cell to see the interactive component.
from google.datalab.ml import FeatureSliceView
FeatureSliceView().plot(error_by_weekday)
Out[45]:
From the charts above, we can see that the model performs worst at hours 5 and 6 (why?), and best on Sundays (less traffic?).
In [ ]:
!gsutil -m rm -r -f gs://datalab-chicago-taxi-demo # Clean up: delete the bucket and all results created by this notebook.